Nonlinear Dynamics of the Perceived Pitch of Complex Sounds 
We apply results from nonlinear dynamics to an old problem in acoustical physics: the mechanism of the
perception of the pitch of sounds, especially the sounds known as complex tones that are important for music 
and speech intelligibility. 



PACS numbers: 05.45.-a, 43.66.+y, 87.19.La 

The pitch of a sound is where we perceive it to lie on a musical scale. Like all sensations, pitch is a subjective quantity related to physical attributes of the stimulus, in this case mainly its component frequencies. For a pure tone with a single frequency component, the relation is monotonic and enables us to adopt an operational definition of pitch in terms of frequency: the pitch of an arbitrary sound is given by the frequency of a pure tone of the same pitch. The scientific investigation of pitch has a long history dating back to Pythagoras, but the origin of the pitch of complex tones with several frequency components is still not well understood. The first perceptual theories considered pitch to arise at a peripheral level in the auditory system; more recently it has been thought that central nervous system processing is necessary […]. However, the experimental evidence is that this processing is carried out before the primary auditory cortex […]. The latest models integrate neural and peripheral processing […]. Here we develop a nonlinear theory of pitch perception for complex tones that describes experimental results on the pitch perception of complex sounds at least as well as do current models, and that removes the need for extensive processing at higher levels of the auditory system.

A key phenomenon in pitch perception is known as the problem of the missing fundamental, virtual pitch, or residue perception […], and consists of the perception of a pitch that cannot be mapped to any frequency component of the stimulus. Suppose that a periodic tone such as that shown in Fig. 1a is presented to the ear. Its pitch is perceived to be that of a pure tone at the frequency of the fundamental. The number of higher harmonics and their relative amplitudes give timbral characteristics to the sound, which allow one to distinguish a trumpet from a violin playing the same musical note. Now suppose that the fundamental and some of the first few higher harmonics are removed (Fig. 1b). Although the timbre changes, the pitch of the tone remains unchanged and equal to the missing fundamental; this is residue perception.

The first physical theory for the residue is due to von Helmholtz […], who attributed it to the generation of difference combination tones in the nonlinearities of the ear. A passive nonlinearity fed by two sources with frequencies $\omega_1$ and $\omega_2$ generates combination tones of frequency $\omega_c$, which are nontrivial solutions of the equation $p\omega_1 + q\omega_2 + \omega_c = 0$, where $p$ and $q$ are integers. For a harmonic complex tone (Fig. 1b), the difference combination tone $\omega_c = \omega_2 - \omega_1$ (i.e., $p = 1$, $q = -1$) between two successive partials has the frequency of the missing fundamental $\omega_0$: for successive partials $k\omega_0$ and $(k+1)\omega_0$, $\omega_c = (k+1)\omega_0 - k\omega_0 = \omega_0$. However, a crucial experiment seriously challenged nonlinear theories of the residue. Schouten et al. […] demonstrated that the behavior of the residue cannot be described by a difference combination tone: if we shift all the partials by the same amount $\Delta\omega$ (Fig. 1c), the difference combination tone remains unchanged, and the same should thus be true of the residue. Instead it is found that the perceived pitch also shifts, showing a linear dependence on $\Delta\omega$ (Fig. 1d). This phenomenon is known as the first pitch shift effect, and has been accurately measured in many experiments (psychoacoustic experiments on pitch can attain an accuracy of 0.2% [13]). A first attempt to model qualitatively the behavior of the pitch shift shows that the slopes of the lines in Fig. 1d depend roughly on the inverse of the harmonic number $k+1$ of the central partial of a three-component complex tone. However, for small $k$ at least, the slope is slightly but consistently larger than $1/(k+1)$, though smaller than $1/k$. This behavior is known as the second pitch shift effect. Also, enlarging the spacing between partials while keeping the central frequency fixed produces a decrease in the residue pitch. As this anomalous behavior seems to be correlated with the second pitch shift effect, it is usually included within it […]. Pitch-shift experiments, and others demonstrating that the residue is elicited dichotically (part of the stimulus exciting one ear and the rest of the stimulus the other, contralateral, ear) and not just monotically and diotically (all of the stimulus exciting one or both ears), led to the abandonment of peripheral (periodicity and place […]) theories and to the development of theories that considered the pitch of complex tones to be a result of central nervous system processing […]; thence to integrated neural and peripheral models […].
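The contrast drawn by Schouten et al. can be checked with a few lines of arithmetic. The sketch below enumerates low-order nontrivial solutions of $p\omega_1 + q\omega_2 + \omega_c = 0$ for two successive partials, before and after a uniform shift. The helper name `combination_tones`, the choice $\omega_0 = 200$ Hz, $k = 7$, the 50 Hz shift, and the cutoff $|p|, |q| \le 2$ are illustrative assumptions, not values taken from the experiments. The difference tone ($p = 1$, $q = -1$) stays at 200 Hz in both cases, so on its own it cannot track a shifting pitch.

```python
def combination_tones(w1, w2, max_order=2):
    """Positive solutions w_c of p*w1 + q*w2 + w_c = 0 with 0 < |p|, |q| <= max_order.

    Cases with p == 0 or q == 0 are skipped: they are harmonics of a single
    partial rather than combination tones. Returns a dict {(p, q): w_c}.
    """
    tones = {}
    for p in range(-max_order, max_order + 1):
        for q in range(-max_order, max_order + 1):
            if p == 0 or q == 0:
                continue
            wc = -(p * w1 + q * w2)
            if wc > 0:
                tones[(p, q)] = wc
    return tones


w0, k, shift = 200, 7, 50  # illustrative values in Hz (assumed, not from the data)
harmonic = combination_tones(k * w0, (k + 1) * w0)
shifted = combination_tones(k * w0 + shift, (k + 1) * w0 + shift)

# The difference tone (p, q) = (1, -1) is unaffected by the uniform shift.
print(harmonic[(1, -1)], shifted[(1, -1)])  # 200 200
```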

We demonstrate below that the crucial pitch-shift experiment of Schouten et al. can be accurately described in terms of generic attractors of nonlinear dynamical systems, such that a theory of pitch perception for complex tones can be constructed without the need to resort to extensive central processing. We model the auditory system as a generic nonlinear forced oscillator, and identify experimental data with structurally stable behavior of this class of dynamical system. We emphasize that since we are interested in universal behavior, the results we obtain are not dependent on the construction of a particular model, but rather represent the behavior of any dynamical system of this type.
FIG. 1. Fourier spectra and pitch of complex tones. Whereas pure tones have a sinusoidal waveform corresponding to a single frequency, almost all musical sounds are complex tones that consist of a lowest frequency component, or fundamental, together with higher frequency overtones. The fundamental plus overtones are collectively called partials. (a) A harmonic complex tone. The overtones are successive integer multiples $k = 2, 3, 4, \ldots$ of the fundamental $\omega_0$ that determines the pitch. The partials of a harmonic complex tone are termed harmonics. (b) Another harmonic complex tone. The fundamental and the first few higher harmonics have been removed. The pitch remains the same and equal to the missing fundamental. This pitch is known as virtual or residue pitch. (c) An anharmonic complex tone. The partials, which are no longer harmonics, are obtained by a uniform shift $\Delta\omega$ of the previous harmonic case (shown dashed). Although the difference combination tones between successive partials remain unchanged and equal to the missing fundamental, the pitch shifts by a quantity $\Delta P$ that depends linearly on $\Delta\omega$. (d) Pitch shift. Pitch as a function of the central frequency $f = (k+1)\omega_0 + \Delta\omega$ of a three-component complex tone $\{k\omega_0 + \Delta\omega,\ (k+1)\omega_0 + \Delta\omega,\ (k+2)\omega_0 + \Delta\omega\}$. The pitch-shift effect is shown here for $k = 6$, 7, and 8. Three-component complex tones are often used in pitch experiments because they elicit a clear residue sensation and can easily be obtained by amplitude modulation of a pure tone of frequency $f$ with another pure tone of frequency $\omega_0$. When $f$ is an integer multiple of $\omega_0$, $\Delta\omega = 0$, and the three frequencies are successive multiples of some missing fundamental. At this point $\Delta P = 0$, and the pitch is $\omega_0$, coincident with the frequency of the missing fundamental.
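The remark about amplitude modulation follows from the product-to-sum identity $\cos a\,(1 + m\cos b) = \cos a + \tfrac{m}{2}\cos(a-b) + \tfrac{m}{2}\cos(a+b)$. The minimal NumPy sketch below synthesizes such a tone and reads off its spectral lines. It assumes a 48 kHz sampling rate, a 1 s duration (so every integer frequency falls exactly on an FFT bin), full modulation depth $m = 1$, and illustrative frequencies $\omega_0 = 200$ Hz, $k = 7$, $\Delta\omega = 50$ Hz. The helper names `am_tone` and `spectral_peaks` are hypothetical.

```python
import numpy as np


def am_tone(f, g, m=1.0, fs=48_000, duration=1.0):
    """Pure tone of frequency f amplitude-modulated by a pure tone of frequency g.

    cos(2*pi*f*t) * (1 + m*cos(2*pi*g*t)) contains components at f - g, f and f + g.
    """
    t = np.arange(int(fs * duration)) / fs
    return np.cos(2 * np.pi * f * t) * (1 + m * np.cos(2 * np.pi * g * t))


def spectral_peaks(x, fs, threshold=0.1):
    """Frequencies (Hz) whose amplitude-normalized FFT magnitude exceeds threshold."""
    amplitude = np.abs(np.fft.rfft(x)) / (len(x) / 2)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return freqs[amplitude > threshold]


fs = 48_000
g = 200.0         # modulating frequency, playing the role of omega_0 (illustrative)
f = 8 * g + 50.0  # carrier (k + 1) * omega_0 + delta_omega with k = 7, delta_omega = 50 Hz
x = am_tone(f, g, fs=fs)
print(spectral_peaks(x, fs))  # [1450. 1650. 1850.]
```

The carrier appears with unit amplitude and each sideband with amplitude $m/2$, so the three lines $f - g$, $f$, $f + g$ are exactly the anharmonic three-component tone of panel (c) with $\Delta\omega = 50$ Hz.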



Since complex tones can be decomposed as a series of partials (a sum of purely sinusoidal components) and residue perception is elicited with at least two of these, we search for suitable attractors in the class of two-frequency quasiperiodically forced oscillators. Quasiperiodically forced oscillators show a great variety of qualitative behavior that falls into the three categories of periodic attractors, quasiperiodic attractors, and strange attractors (both chaotic and nonchaotic). We propose that the residue behavior we seek to explain is a resonant response to forcing the auditory system quasiperiodically. From stability arguments [14–17] we single out a particular type of two-frequency quasiperiodic attractor, which we term a three-frequency resonance, as the natural candidate for modelling the residue. Three-frequency resonances are given by the nontrivial solutions of the equation $p\omega_1 + q\omega_2 + r\omega_R = 0$, where $p$, $q$, and $r$ are integers, $\omega_1$ and $\omega_2$ are the forcing frequencies, and $\omega_R$ is the resonant response; they can be written compactly in the form $(p, q, r)$. Notice that combination tones are three-frequency resonances of the restricted class $(p, q, 1)$. This is the only type of response possible from a passive nonlinearity, whereas a dynamical system such as a forced oscillator is an active nonlinearity with at least one intrinsic frequency, and can exhibit the full panoply of three-frequency resonances, which include subharmonics of combination tones. The investigation of three-frequency systems is a young area of research, and there is not yet any consistency in the nomenclature of these resonances in the scientific literature: as well as three-frequency resonances, they are also called weak resonances or partial mode lockings; see Baesens et al. [18] and references therein. It is known that three-frequency resonances form an extensive web in the parameter space of a dynamical system. In particular, between any two parent resonances $(p_1, q_1, r_1)$ and $(p_2, q_2, r_2)$ lies the daughter resonance $(p_1 + p_2, q_1 + q_2, r_1 + r_2)$ on the straight line in parameter space connecting the parents.
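The daughter construction can be made concrete by solving $p\omega_1 + q\omega_2 + r\omega_R = 0$ for $\omega_R$. For the daughter, $\omega_R = (r_1\omega_{R,1} + r_2\omega_{R,2})/(r_1 + r_2)$, a weighted mean of the parents' response frequencies, so for $r_1, r_2 > 0$ it lies between them. The sketch below is only an arithmetic illustration of this bookkeeping, not a simulation of an oscillator; the helper names, the parent resonances that lock $\omega_R$ to $\omega_1/k$ and $\omega_2/(k+1)$, and the forcing frequencies 1450 and 1650 Hz are illustrative assumptions.

```python
from fractions import Fraction


def response_frequency(resonance, w1, w2):
    """Solve p*w1 + q*w2 + r*wR = 0 for the response frequency wR of resonance (p, q, r)."""
    p, q, r = resonance
    return Fraction(-(p * w1 + q * w2), r)


def daughter(parent1, parent2):
    """Daughter resonance: componentwise sum of the two parents."""
    return tuple(a + b for a, b in zip(parent1, parent2))


k = 7
w1, w2 = 1450, 1650                       # illustrative forcing frequencies (Hz), integers
left, right = (-1, 0, k), (0, -1, k + 1)  # wR = w1/k and wR = w2/(k+1)
child = daughter(left, right)             # (-1, -1, 2k + 1)

for res in (left, child, right):
    print(res, float(response_frequency(res, w1, w2)))
```

Exact `Fraction` arithmetic is used so that coincidences such as the harmonic case (all three response frequencies equal to $\omega_0$) are reproduced without rounding.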

For pitch-shift experiments, the proximity of the external frequencies to successive multiples of some missing fundamental ensures that $(k+1)/k$ is a good rational approximation to their frequency ratio $\omega_2/\omega_1$. Hence we concentrate on a small interval around the missing fundamental between the frequencies $\omega_1/k$ and $\omega_2/(k+1)$, which correspond to the resonances $(-1, 0, k)$ and $(0, -1, k+1)$. We suppose that the residue should be associated with the largest resonance in this interval. In numerical simulations and experiments with electronic oscillators […], we have confirmed that for small nonlinearity the daughter of these resonances, $(-1, -1, 2k+1)$, is the resonance of greatest width between its parents. Hence we propose that the three-frequency resonance formed between the two lower-frequency components of the complex tone and the response frequency $P = (\omega_1 + \omega_2)/(2k+1)$ gives rise to the perceived residue pitch $P$. Since in pitch-shift experiments the external frequencies are $\omega_1 = k\omega_0 + \Delta\omega$ and $\omega_2 = (k+1)\omega_0 + \Delta\omega$, the response frequency is $P = [(2k+1)\omega_0 + 2\Delta\omega]/(2k+1)$, so its shift with respect to the missing fundamental is $\Delta P = 2\Delta\omega/(2k+1)$. This equation gives a linear dependence of the shift on the detuning $\Delta\omega$, in agreement with the first pitch shift effect.
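The derivation of $\Delta P$ is short enough to check with exact rational arithmetic. The sketch below evaluates $P = (\omega_1 + \omega_2)/(2k+1)$ for a few detunings and compares the resulting shift with the closed form $2\Delta\omega/(2k+1)$. The helper `residue_pitch` is hypothetical, and $k = 7$, $\omega_0 = 200$ Hz are illustrative values.

```python
from fractions import Fraction


def residue_pitch(k, w0, dw):
    """Response frequency P of the (-1, -1, 2k+1) resonance.

    The forcing frequencies are the two lowest partials
    w1 = k*w0 + dw and w2 = (k+1)*w0 + dw.
    """
    w1 = k * w0 + dw
    w2 = (k + 1) * w0 + dw
    return Fraction(w1 + w2, 2 * k + 1)


k, w0 = 7, 200  # illustrative harmonic number and missing fundamental (Hz)
for dw in (0, 25, 50):
    shift = residue_pitch(k, w0, dw) - w0
    # The computed shift coincides with the closed form 2*dw/(2k+1).
    print(dw, shift, Fraction(2 * dw, 2 * k + 1))
```

Each printed row shows the detuning, the computed shift, and the closed form, which agree exactly (for example 10/3 Hz at a 25 Hz detuning).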
FIG. 2. Experimental data with our predictions superimposed show pitch as a function of the central frequency $f = (k+1)\omega_0 + \Delta\omega$ for a three-component complex tone $\{f - g, f, f + g\}$, for $k = 6$, 7, 8, 9, 10, and 11. The component spacing is $g = \omega_0 = 200$ Hz. Circles, triangles, and dots represent experimental data for three different subjects (from Schouten et al. […]). The perceived pitch shifts linearly with the detuning $\Delta\omega$. The pitch shift effect we predict is shown superimposed on the data as solid lines satisfying the equation $P = g + 2[f - (k+1)g]/(2k+1)$. The lines describe the behavior of the response frequency in a three-frequency resonance formed with the two lowest-frequency components of the stimulus. The lines agree with the psychophysical data without any fitting parameters.
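The caption's formula is the resonance response $(\omega_1 + \omega_2)/(2k+1)$ rewritten for the stimulus $\{f-g, f, f+g\}$: with $\omega_1 = f - g$ and $\omega_2 = f$ one gets $(2f - g)/(2k+1) = g + 2[f - (k+1)g]/(2k+1)$. The sketch below checks that the two forms agree and tabulates the endpoints of each predicted line for $k = 6$ to 11. It assumes $g = 200$ Hz as in the caption and an illustrative detuning range of ±60 Hz, which is not taken from the data; the function names are hypothetical.

```python
def predicted_pitch(f, k, g=200.0):
    """Caption formula P = g + 2*(f - (k+1)*g)/(2k+1) for the tone {f - g, f, f + g}."""
    return g + 2 * (f - (k + 1) * g) / (2 * k + 1)


def resonance_pitch(f, k, g=200.0):
    """Response frequency of the (-1, -1, 2k+1) resonance on the two lowest components."""
    return ((f - g) + f) / (2 * k + 1)


g = 200.0
for k in range(6, 12):
    f_center = (k + 1) * g  # zero detuning: the predicted pitch equals g
    for f in (f_center - 60.0, f_center, f_center + 60.0):  # illustrative detunings
        assert abs(predicted_pitch(f, k, g) - resonance_pitch(f, k, g)) < 1e-9
    print(k, predicted_pitch(f_center - 60.0, k, g), predicted_pitch(f_center + 60.0, k, g))
```

Every line passes through $P = g$ at zero detuning, and lines for larger $k$ are flatter, as in the figure.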

In Fig. 2 we have superimposed the behavior of the corresponding three-frequency resonances on data for the pitch shift for three different subjects. There is good agreement, which explains both the first pitch shift effect and the first aspect of the second pitch shift effect, because the predicted slope is $1/(k + 1/2)$, between $1/(k+1)$ and $1/k$. The second aspect of the second pitch shift effect can be interpreted as follows: the term $2\Delta\omega$ in the equation for $\Delta P$ arises from the two equal contributions $\Delta\omega$ obtained by a uniform shift in the two forcing frequencies. If, while keeping $\omega_2$ fixed, we increase the interval $\omega_2 - \omega_1$ (that is, we enlarge the spacing between successive partials), the contribution from $\omega_2$ remains constant and equal to $\Delta\omega$ while that from $\omega_1$ diminishes, giving a decrease in the response frequency of the resonance and thus in the residue.
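Both aspects of the second pitch shift effect follow from $P = (\omega_1 + \omega_2)/(2k+1)$. The short sketch below checks the slope bound $1/(k+1) < 2/(2k+1) < 1/k$ for $k = 6$ to 11, then widens the spacing $\omega_2 - \omega_1$ while holding $\omega_2$ fixed and confirms that the predicted residue falls. The helper `resonance_pitch` is hypothetical, and $k = 7$, $\omega_2 = 1600$ Hz, and the spacings are illustrative values.

```python
def resonance_pitch(w1, w2, k):
    """Response frequency (w1 + w2)/(2k + 1) of the (-1, -1, 2k+1) resonance."""
    return (w1 + w2) / (2 * k + 1)


for k in range(6, 12):
    slope = 2 / (2 * k + 1)  # dP/d(delta_omega) = 1/(k + 1/2)
    # First aspect: the slope lies strictly between 1/(k+1) and 1/k.
    assert 1 / (k + 1) < slope < 1 / k

# Second aspect: widen the spacing g with the central partial w2 = f held fixed.
k, f = 7, 1600.0  # illustrative values (Hz)
pitches = [resonance_pitch(f - g, f, k) for g in (200.0, 210.0, 220.0)]
print(pitches)
assert pitches[0] > pitches[1] > pitches[2]  # residue decreases as spacing grows
```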

For small harmonic numbers $k$ our model predictions fall within the experimental error bars for most of the pitch-shift experiments reported […, 19, 20]. For larger $k$ there are systematic deviations: the residue slopes become shallower than the theoretical predictions of both our theory and the central theories. An accepted explanation for this is that difference combination tones generated by passive nonlinear mixing in the auditory periphery can play the role of the lowest frequency components of the stimulus. The same explanation is also valid for our approach. Our goal here, however, is simply to demonstrate that physical frequencies other than combination tones can accurately describe residue behavior in nonlinear terms without fitting parameters, and can overcome the objections against nonlinear theories raised by pitch-shift experiments.

Dichotic perception has been another argument against nonlinear theories, because it has been thought to imply central nervous system processing [21]. However, subcortical nervous paths, for example through the superior olivary complex, allow the exchange of information between both (left and right) peripheral auditory systems. Moreover, frequency information up to 5 kHz is preserved until at least this point in the auditory system [22, 23]. These two factors allow frequency components of the stimulus that arrive at different ears to interact at a subcortical level. This implies that three-frequency resonances can be generated not only monotically but also dichotically by a mechanism such as the one we are proposing. Experimental evidence for the existence of multifrequency resonant responses is given by electrophysiological recordings in single units of the auditory midbrain nucleus of the guinea fowl [24]. Our mechanism is also consistent with neuromagnetic measurements performed on human subjects showing that pitch processing of complex tones is carried out before the primary auditory cortex […].

Thanks to the residue we can appreciate music on a small radio with negligible response at low frequencies; but the residue is not just an acoustical curiosity. It seems to play an important role in music perception and speech intelligibility. The importance of the residue for the perception of musical sounds has long been recognized. It has been proposed […] that residue perception is at the heart of the fundamental bass of Rameau [25]. As the fundamental bass and its more modern counterparts form a key element for the melodic and harmonic structuring of musical sounds, a physical basis for the residue may contribute to the construction of an objectively grounded theory of music [14]. As for speech intelligibility, hearing aids that furnish fundamental frequency information produce better scores in profoundly hearing-impaired subjects than simple amplification [26]. A better knowledge of the basic mechanisms involved in pitch perception should allow a similar improvement in its applications to technology and medicine.

The principle of economy, widespread in nature, suggests that by means of our proposed mechanism, complex tones may be preprocessed at a subcortical level to feed optimized pitch candidates to more central zones of the nervous system, freeing them of an enormous quantity of calculation. In this way, the nonlinear behavior of the auditory periphery may contribute greatly to explaining the astonishing real-time analytical capabilities of the auditory system [27].